INDEX
Explanations
mentions of political figures or entities
repeated references to a specific individual, particularly in a political context
Negative Logits
mathemat   -0.82
fortun     -0.79
Jericho    -0.75
Yor        -0.72
Franch     -0.69
interf     -0.68
cob        -0.67
Brist      -0.67
Tid        -0.67
airs       -0.65
Positive Logits
Ļ          1.53
¬          1.46
ı          1.29
Ħ¢         1.28
ħ          1.23
ij         1.23
¡          1.20
Ĵ          1.20
Ķ          1.19
¤          1.19
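The two lists above are ranked token scores. This page does not say how those scores are produced. A common convention for SAE dashboards is to take the dot product of the feature's decoder direction with each token's unembedding vector, then list the highest and lowest results. The sketch below shows that convention under that assumption. The names `logit_weights` and `top_bottom` and the toy vocabulary are hypothetical and are not taken from any dashboard's code.

```python
from typing import Dict, List, Tuple

def logit_weights(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed convention: score each token by dot(decoder direction, token's unembedding vector)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_bottom(scores: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring tokens (descending) and the k lowest (ascending)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Hypothetical toy data: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5], "d": [-1.0, 0.0]}

pos, neg = top_bottom(logit_weights(decoder_dir, unembed), k=2)
print("positive:", pos)  # [('a', 1.0), ('c', 0.25)]
print("negative:", neg)  # [('d', -1.0), ('b', -0.5)]
```

In a real model the vocabulary has tens of thousands of entries, and a matrix library would compute the scores in one product. The ranking step stays the same.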
Activation Density 0.256%
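Activation density is the fraction of token positions on which the feature fires. A density of 0.256% therefore means the feature is active on roughly 1 token in 390. The sketch below assumes that "fires" means an activation strictly greater than a threshold of 0.0. The function name `activation_density` and that threshold are illustrative assumptions, not this dashboard's documented definition.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed firing rule)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Hypothetical sample: 256 firing positions out of 100,000 tokens.
acts = [0.0] * 99_744 + [1.0] * 256
density = activation_density(acts)
print(f"{density:.3%}")  # 0.256%
```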