Explanations
proper nouns and names, specifically focusing on those related to politics, influence, and power
words and phrases related to specific individuals or groups, particularly in complex or negative contexts
Negative Logits
bourg       -0.71
opers       -0.58
ograp       -0.58
ppers       -0.58
ikh         -0.58
gears       -0.57
ngth        -0.57
oshenko     -0.57
ppa         -0.56
suppose     -0.55
Positive Logits
uary        0.75
ÃĽ          0.74
heid        0.72
afia        0.70
ILA         0.70
eki         0.69
uration     0.69
SourceFile  0.68
dash        0.67
lie         0.65
Activations Density 0.058%