Explanations
words and classes related to programming constructs and types in a large codebase
Negative Logits
ſelves      -0.95
itſelf      -0.86
myſelf      -0.85
ſelf        -0.83
ſtate       -0.83
pleaſure    -0.82
fevere      -0.80
Majefty     -0.79
ſever       -0.79
houſe       -0.79
Positive Logits
[]          0.67
aux         0.56
i           0.56
itu         0.54
inex        0.54
p           0.52
Nature      0.51
nature      0.51
ós          0.50
Gön         0.49
Activations Density 0.308%