INDEX
Explanations
Activates strongly on punctuation or specific phrases related to significant news events or statements.
Negative Logits
men         -0.59
agi         -0.55
that        -0.54
trate       -0.53
mà          -0.48
Vaid        -0.48
бина        -0.47
ाल          -0.47
abler       -0.47
FORMANCE    -0.47
Positive Logits
Roskov              0.77
tvguidetime         0.69
principalColumn     0.66
(no visible token)  0.66
存于互联网档案馆      0.64
tartalomajánló      0.63
[]).                0.60
bestos              0.58
verläs              0.58
Мексичка            0.58
Activations Density 0.067%