Explanations
phrases related to some kind of political reporting or commentary
Negative Logits
Token      Logit
comprom    -0.68
oint       -0.61
ONSORED    -0.57
nib        -0.56
toxin      -0.55
respons    -0.54
fib        -0.52
ãĥ¥        -0.52
direct     -0.52
Tamil      -0.52
Positive Logits
Token        Logit
abouts       1.34
upon         1.18
FORE         0.97
fore         0.96
after        0.91
are          0.86
Goes         0.86
ain          0.84
Were         0.83
buquerque    0.83
Activations Density: 0.092%