Explanations
conjunctions and transition phrases
Negative Logits
oulos      -0.18
_RD        -0.15
ede        -0.15
los        -0.14
Hitch      -0.14
otos       -0.14
LW         -0.14
rones      -0.13
parties    -0.13
271        -0.13
Positive Logits
ake        0.15
:eq        0.15
ίο         0.15
umbo       0.15
Retorna    0.15
ACHER      0.14
oen        0.14
ximity     0.14
ecure      0.14
ULO        0.14
Activations Density: 0.001%