INDEX
Explanations
instances of verbs that signify the addition or inclusion of information
Negative Logits
etre     -0.18
ÑĢеб     -0.15
hete     -0.14
rette    -0.14
enter    -0.14
strup    -0.14
Tre      -0.13
-l       -0.13
æĬ       -0.13
icina    -0.13
Positive Logits
igham    0.17
uce      0.16
iction   0.14
chluss   0.14
ict      0.14
OLON     0.13
icted    0.13
olon     0.13
IEL      0.13
onical   0.13
Activations Density 0.018%