Explanations
names or terms related to specified individuals or projects
instances of a specific name and its variations, especially related to legal or news contexts
Negative Logits
vill      -0.69
Witch     -0.68
venge     -0.68
agg       -0.67
Wolves    -0.66
Spo       -0.65
war       -0.65
glass     -0.64
Sor       -0.64
War       -0.63
Positive Logits
ERT       4.03
redo      1.44
Bold      1.44
flagged   1.05
ERG       1.03
OTS       1.02
RSA       0.99
ert       0.91
ERN       0.88
IRED      0.87
Activations Density: 0.021%