Explanations
instances where the word "doesn't" is included
negations or words indicating absence
Negative Logits
lehem    -0.77
odon     -0.70
igers    -0.65
ende     -0.65
udo      -0.63
chin     -0.63
zel      -0.63
ghazi    -0.63
iger     -0.63
fall     -0.62
Positive Logits
cean       0.81
charact    0.72
iggurat    0.67
atell      0.65
corrid     0.65
ospons     0.62
axis       0.62
hyde       0.62
ajo        0.62
ãĢİ        0.61
Activations Density: 0.000%