INDEX
Explanations
names containing "Anwar"
mentions of war-related terms
Negative Logits
essee    -0.90
aminer   -0.87
sembly   -0.86
ĸļ       -0.80
choes    -0.74
ocre     -0.71
suspic   -0.69
obook    -0.68
URES     -0.68
iquid    -0.66
Positive Logits
rior     1.67
riors    1.56
fare     1.46
ring     0.99
lords    0.94
lord     0.93
locks    0.92
war      0.90
lock     0.88
ped      0.87
Activations Density: 0.018%