INDEX
Explanations
phrases related to political figures or political actions
words related to a specific cultural or regional identity
Negative Logits
ques          -0.62
ICES          -0.61
OME           -0.61
textbooks     -0.60
Seymour       -0.58
gyn           -0.58
ãĥ¡           -0.58
Mods          -0.57
attendance    -0.57
Brexit        -0.55
Positive Logits
lasses        1.29
nir           1.12
regate        1.03
aroo          1.01
sa            0.99
sung          0.99
oing          0.98
fu            0.98
sv            0.96
sten          0.90
Activations Density 0.049%