INDEX
Explanations
references to spreading rumors or unfounded claims
phrases indicating rumors or accusations
Negative Logits
Token        Logit
atre         -0.72
borg         -0.66
pling        -0.65
onomic       -0.62
ocks         -0.62
ien          -0.61
ouk          -0.61
osures       -0.61
waters       -0.61
ey           -0.60
Positive Logits
Token        Logit
accompanies  0.98
soever       0.92
arose        0.87
preceded     0.84
they         0.77
contradicts  0.77
©¶æ          0.76
accompanied  0.75
contradicted 0.73
surrounds    0.72
Activations Density: 0.192%