INDEX
Explanations
prepositions and conjunctions indicating relationships or actions
words and phrases indicating conditions or exceptions in reasoning
Negative Logits
sonian        -0.75
anger         -0.74
aban          -0.70
DAQ           -0.69
obin          -0.66
ouf           -0.63
anamo         -0.62
istani        -0.60
AMI           -0.60
bernatorial   -0.59
Positive Logits
ĨĴ            0.64
majority      0.62
jong          0.61
"{            0.60
Ħ¢            0.59
ideo          0.57
spite         0.56
©¶æ           0.54
+(            0.54
SOME          0.53
Activations Density 0.246%