INDEX
Explanations
phrases related to consequences or predictions
phrases related to negative consequences or warnings
Negative Logits
Token         Logit
reception     -0.64
Paran         -0.63
Malays        -0.62
Lithuan       -0.62
Lindsey       -0.62
Ivy           -0.61
Malaysian     -0.61
Monteneg      -0.60
htaking       -0.58
Zika          -0.58
Positive Logits
Token               Logit
automatically       0.86
suddenly            0.79
Ł                   0.76
indistinguishable   0.74
İ                   0.71
disappears          0.70
Untitled            0.70
spontaneously       0.70
becomes             0.70
magically           0.69
Activations Density: 0.564%