drshapeless


Generate C++ function definition in Emacs with tree-sitter

Tags: emacs | c++

Create: 2024-09-28, Update: 2024-09-28

No treesit-based tool

To my surprise, there is no available package that generates a C++ function definition from its declaration.

I thought it would be a very trivial thing in C++ tooling.

There is one based on an ancient package, semantic-mode. I have no idea what that is.

https://github.com/tuhdo/semantic-refactor

It is not very useful in modern Emacs with LSP and treesit support.

Do it myself

I guess I have to do it myself.

I already had a half-working version from when I was writing C a while ago.

Making a C++ one should not be that hard, right?

Step by step

  1. Use treesit-explore-mode to investigate the header.

  2. Search for the enclosing scopes up to the outermost one: struct, class, namespace and template.

  3. Prepend the scope names, from outermost to innermost, to the function name, separated by double colons.

  4. Put the resulting string into the kill-ring or into the source file.

  5. Profit???
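Steps 2 and 3 boil down to collecting scope names and joining them. Here is a minimal sketch of step 3 in C++ (not the Elisp used in this post), assuming the scope names have already been collected from the syntax tree in outermost-to-innermost order; `qualify_name` is a hypothetical helper.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: prefix a function name with its enclosing scopes.
// `scopes` is assumed to be ordered from the outermost scope to the innermost,
// e.g. {"shapeless", "container", "Darray<T>"}.
std::string qualify_name(const std::vector<std::string> &scopes,
                         const std::string &function_name) {
    std::string result;
    for (const std::string &scope : scopes) {
        result += scope;
        result += "::";  // separate each scope with a double colon
    }
    return result + function_name;
}
```

For example, `qualify_name({"shapeless", "container", "Darray<T>"}, "init")` gives `shapeless::container::Darray<T>::init`, and an empty scope list leaves the name unchanged.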

treesit-explore-mode

This is a built-in minor mode from Emacs 29's treesit support. It allows you to, well, explore the code from treesit's point of view: it displays the syntax tree of the current buffer in a separate window.

Search for the outermost scope

The key function is treesit-parent-until: starting from a node, it walks up the node's ancestors until a predicate returns non-nil, and returns that ancestor. Another candidate is treesit-parent-while, but to capture the scope node (and with it the scope name) in one go, treesit-parent-until is the better choice.
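To make the semantics concrete, here is a toy model of what treesit-parent-until does, written in C++ for illustration only. `ToyNode` and `parent_until` are hypothetical stand-ins, not the treesit API: the node's parent is checked first, then the grandparent, and so on, and the optional `include_node` flag mirrors the optional third argument of treesit-parent-until, which also tests the starting node itself.

```cpp
#include <functional>
#include <string>

// Toy stand-in for a tree-sitter node: just a type name and a parent link.
// This is NOT the treesit API, only a model of treesit-parent-until.
struct ToyNode {
    std::string type;
    const ToyNode *parent;
};

// Walk up from `node` and return the closest ancestor for which `pred` is
// true. With include_node set, `node` itself is tested first.
// Returns nullptr if nothing matches.
const ToyNode *parent_until(const ToyNode *node,
                            const std::function<bool(const ToyNode &)> &pred,
                            bool include_node = false) {
    if (node == nullptr) return nullptr;
    const ToyNode *current = include_node ? node : node->parent;
    while (current != nullptr) {
        if (pred(*current)) return current;
        current = current->parent;
    }
    return nullptr;
}
```

Starting from a field_identifier inside a class body and searching for "class_specifier" returns the class node; searching for "field_declaration" from a field_declaration only finds it when `include_node` is true.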

Let's write a function that returns the prefix of the function.

Let's take my attempt at making a darray (dynamic array) as an example.

namespace shapeless {
namespace container {

template <typename T>
struct Darray {
public:
    T *data;

    void init(mem::Allocator *allocator);
    void release();
    void resize(size_t size);
    void push(T item);
    T pop();
    void clear();
    size_t size();

private:
    size_t capacity;
    size_t length;

    mem::Allocator *allocator;
};
}
}

To write a full-blown definition of the init function outside the namespaces, we have to prefix the function like this.

template <typename T>
void shapeless::container::Darray<T>::init(shapeless::mem::Allocator *allocator) {

}

What the fuck, that is way too long. Another issue is that the mem namespace lives inside the shapeless namespace. We might know that, but working it out in code would be so much work. Let's forget about this approach.

I would rather reopen the two namespace scopes in the definition as well.

namespace shapeless {
namespace container {
template <typename T>
void Darray<T>::init(mem::Allocator *allocator) {
}
}
}
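To check that this layout actually compiles, here is a trimmed, self-contained version of the Darray example. The `mem::Allocator` type is a hypothetical empty placeholder (the real one is not shown in this post), only `init` and `size` are kept, and the `init` body is just an illustration.

```cpp
#include <cstddef>

namespace shapeless {
namespace mem {
// Hypothetical placeholder for the real allocator type.
struct Allocator {};
}  // namespace mem

namespace container {

template <typename T>
struct Darray {
public:
    T *data;

    void init(mem::Allocator *allocator);
    std::size_t size();

private:
    std::size_t capacity;
    std::size_t length;

    mem::Allocator *allocator;
};

}  // namespace container
}  // namespace shapeless

// The definitions reopen the same namespaces, so Darray<T>:: is the only
// prefix left to generate, and mem::Allocator is found through the
// enclosing shapeless namespace.
namespace shapeless {
namespace container {

template <typename T>
void Darray<T>::init(mem::Allocator *allocator) {
    this->allocator = allocator;  // the parameter shadows the member
    this->data = nullptr;
    this->capacity = 0;
    this->length = 0;
}

template <typename T>
std::size_t Darray<T>::size() {
    return length;
}

}  // namespace container
}  // namespace shapeless
```

After `init`, a `Darray<int>` reports a size of 0.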

Forget about the namespace

Now we only have to care about two things: classes (including structs, which are essentially the same thing) and templates.

Let's make some helper functions.

(defun drsl/get-cpp-template-node (class-node)
  "Return the template_declaration node that encloses CLASS-NODE.

Return nil if CLASS-NODE is not in a template."
  ;; Only the ancestors of CLASS-NODE are searched, not CLASS-NODE itself.
  (treesit-parent-until class-node
                        (lambda (node)
                          (string-equal (treesit-node-type node)
                                        "template_declaration"))))

(defun drsl/get-class-function-node-at-point ()
  "Return the field_declaration node of the member function at point."
  ;; The final t makes the search also test the node at point itself.
  (treesit-parent-until (treesit-node-at (point))
                        (lambda (node)
                          (string-equal (treesit-node-type node)
                                        "field_declaration"))
                        t))

(defun drsl/get-cpp-class-node-at-point ()
  "Return the class_specifier or struct_specifier node around point."
  (treesit-parent-until (treesit-node-at (point))
                        (lambda (node)
                          (member (treesit-node-type node)
                                  '("class_specifier" "struct_specifier")))
                        t))

Putting it together

(defun drsl/generate-cpp-class-function-definition-at-point ()
  "Return the class function definition at point."
  (interactive)
  (string-replace
   ";"
   " {\n\n}"
   (let* ((class-node (drsl/get-cpp-class-node-at-point))
          (func-node (drsl/get-class-function-node-at-point))
          (template-node (drsl/get-cpp-template-node class-node))
          (class-text (treesit-node-text
                       (treesit-node-child-by-field-name class-node "name")
                       t))
          (func-text (treesit-node-text func-node t))
          ;; The function name is taken to start at the first identifier
          ;; character after the first space, which skips a one-word
          ;; return type and any `*'.  Without a space (constructor,
          ;; destructor) the search starts at 0.
          (first-space-pos (string-match " " func-text))
          (insert-pos (string-match "[[:alpha:]_~]" func-text first-space-pos)))
     (if template-node
         (let* ((template-parameter (treesit-node-text
                                     (treesit-node-child-by-field-name
                                      template-node "parameters")
                                     t))
                (template-head (concat "template " template-parameter "\n")))
           (concat template-head
                   (substring func-text 0 insert-pos)
                   class-text
                   ;; "<typename T>" or "<class T>" becomes "<T>".
                   (replace-regexp-in-string "\\_<\\(?:typename\\|class\\) +"
                                             "" template-parameter)
                   "::"
                   (substring func-text insert-pos)))
       (concat (substring func-text 0 insert-pos)
               class-text
               "::"
               (substring func-text insert-pos))))))

Here I chose to use a regexp to find the correct insert position in the function string.

See this example.

class Bar {
    void *foo();
    void bello();
};

Here is what we see in treesit-explore-mode

(translation_unit
 (class_specifier class name: (type_identifier)
  body:
   (field_declaration_list {
    (field_declaration type: (primitive_type)
     declarator:
      (pointer_declarator *
       declarator:
        (function_declarator declarator: (field_identifier)
         parameters: (parameter_list ( ))))
     ;)
    (field_declaration type: (primitive_type)
     declarator:
      (function_declarator declarator: (field_identifier)
       parameters: (parameter_list ( )))
     ;)
    }))
 ;)

When there is a pointer in front of the function name, the function_declarator is wrapped in a pointer_declarator. With multiple pointers, it is wrapped once more for each additional *.

This is insane. Unless I recursively call treesit-parent-until to count the pointers, there is no good solution using pure treesit node operations.

I gave up and used a plain regexp to find the first identifier character after the first space. It works fine as long as the return type is a single word, like void, T or size_t.
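The string surgery itself is easy to reproduce outside Emacs. Below is a rough C++ port of the same idea, for illustration only: find the first space, skip to the first letter or underscore after it, splice in the class prefix, and turn the semicolon into an empty body. The function name `make_definition` and the `template_args` parameter (for example "<T>", already stripped of typename) are assumptions of this sketch; the "template <...>" head line would still be prepended separately.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Rough port of the regexp trick: the function name is assumed to start at
// the first letter or underscore after the first space. Any '*' in between
// is skipped, so nested pointer_declarator nodes never need unwrapping.
std::string make_definition(const std::string &declaration,
                            const std::string &class_name,
                            const std::string &template_args = "") {
    std::size_t space = declaration.find(' ');
    std::size_t pos = (space == std::string::npos) ? 0 : space;
    while (pos < declaration.size() &&
           !(std::isalpha(static_cast<unsigned char>(declaration[pos])) ||
             declaration[pos] == '_')) {
        ++pos;
    }
    std::string result = declaration.substr(0, pos) + class_name +
                         template_args + "::" + declaration.substr(pos);
    // Turn the trailing semicolon into an empty function body.
    std::size_t semi = result.rfind(';');
    if (semi != std::string::npos) {
        result.replace(semi, 1, " {\n\n}");
    }
    return result;
}
```

With the Bar example, `make_definition("void *foo();", "Bar")` produces `void *Bar::foo() {` followed by an empty line and `}`. Unlike the Elisp version, which replaces every `;`, this sketch only replaces the last one.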

Improvement

The function is not perfect yet. When a function declaration ends with override, the override keyword is copied into the definition as well, which does not compile: override (like final) may only appear on the declaration inside the class.
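One possible fix, not implemented in this post, is to strip trailing override and final specifiers from the declaration text before generating the definition. Here is a sketch in C++; `strip_virt_specifiers` and `ends_with_word` are hypothetical helpers, and they only handle the simple case where these keywords sit right before the semicolon of a member function declaration.

```cpp
#include <initializer_list>
#include <string>

// True if `text` ends with `word` and the word is preceded by a space or ')'.
static bool ends_with_word(const std::string &text, const std::string &word) {
    if (text.size() < word.size()) return false;
    if (text.compare(text.size() - word.size(), word.size(), word) != 0) {
        return false;
    }
    if (text.size() == word.size()) return true;
    char before = text[text.size() - word.size() - 1];
    return before == ' ' || before == ')';
}

// Remove trailing "override" / "final" from a member function declaration
// such as "void draw() const override;", keeping the final semicolon.
std::string strip_virt_specifiers(std::string declaration) {
    bool had_semicolon = !declaration.empty() && declaration.back() == ';';
    if (had_semicolon) declaration.pop_back();

    bool changed = true;
    while (changed) {
        changed = false;
        // Drop trailing spaces before looking for the next keyword.
        while (!declaration.empty() && declaration.back() == ' ') {
            declaration.pop_back();
        }
        for (const char *keyword : {"override", "final"}) {
            std::string word(keyword);
            if (ends_with_word(declaration, word)) {
                declaration.erase(declaration.size() - word.size());
                changed = true;
            }
        }
    }

    if (had_semicolon) declaration += ';';
    return declaration;
}
```

For example, "void draw() final override;" becomes "void draw();", while a declaration without specifiers is returned unchanged.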

But I have had enough of it; it works for now.

To actually use it, you have to wrap the generation function in another interactive command that, for example, puts the result into the kill-ring. I am not implementing auto-pasting into the corresponding .cpp file.

Full code is here.