INDEX
Explanations
conjunctions and variations of the word "and."
Negative Logits
iler    -0.17
ooks    -0.16
pri     -0.15
eln     -0.14
urt     -0.14
件      -0.14
ugin    -0.14
mers    -0.14
ahan    -0.14
UGIN    -0.14
Positive Logits
imity   0.15
arty    0.14
176     0.14
orts    0.14
rary    0.14
rez     0.14
azi     0.14
lector  0.13
wich    0.13
Linear  0.13
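The negative and positive lists can be read as the two ends of a single ranking of per-token scores. The sketch below shows one way such lists could be produced from a token-to-score mapping. It is a minimal illustration, not the dashboard's own code: the function name `extreme_tokens`, the parameter `k`, and the idea that the scores arrive as a plain dictionary are all assumptions, and the sample reuses only a few of the values listed here.

```python
def extreme_tokens(scores, k=10):
    """Return the k lowest-scoring and k highest-scoring (token, score) pairs.

    `scores` is a hypothetical dict mapping token strings to logit values.
    """
    # Sort ascending by score, so the most negative tokens come first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]
    # Reverse the ranking so the most positive tokens come first.
    positive = ranked[::-1][:k]
    return negative, positive


# A few values copied from the lists above, used only as sample input.
sample_scores = {
    "iler": -0.17,
    "ooks": -0.16,
    "pri": -0.15,
    "imity": 0.15,
    "arty": 0.14,
    "Linear": 0.13,
}

negative, positive = extreme_tokens(sample_scores, k=2)
print(negative)
print(positive)
```

Sorting once and slicing both ends keeps the two lists consistent with each other, which matters when the same token table feeds both panels.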
Activations Density 0.139%
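A density figure like this is usually read as the share of sampled tokens on which the feature fires at all. The sketch below computes that share under this assumption. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumes "density" means the share of tokens with a nonzero activation.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 8 token positions.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.139% would correspond to the feature being active on roughly 1.4 of every 1,000 sampled tokens.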