Explanations
references to beliefs or opinions
repeated use of the word "the"
Negative Logits
Token       Logit
ibaba       -0.78
ãĤ´ãĥ³      -0.75
imi         -0.73
bg          -0.73
icia        -0.71
strate      -0.70
Alert       -0.69
plete       -0.69
ftime       -0.69
ategory     -0.68
Positive Logits
Token       Logit
slightest   1.20
majority    1.09
entire      1.05
latter      0.98
vast        0.97
greatest    0.97
absence     0.97
whole       0.94
biggest     0.92
Russians    0.92
Activations Density: 0.390%