INDEX
Explanations
mentions of a specific geographic location, potentially related to conflict or political figures
instances of a specific name or term related to individuals or entities in the text
Negative Logits
lishing     -0.84
ertodd      -0.78
lier        -0.78
ridges      -0.78
itud        -0.77
ãĥ¼ãĥ«      -0.76
theless     -0.75
TON         -0.74
fully       -0.74
ting        -0.71
Positive Logits
Äĩ          1.10
ya          0.88
emi         0.80
ye          0.80
ennes       0.80
plom        0.79
yah         0.75
alez        0.72
oun         0.72
arbon       0.72
Activations Density 0.029%