Explanations
words related to political figures
Negative Logits
Token     Logit
liness    -0.98
ly        -0.94
LY        -0.91
manship   -0.91
lies      -0.85
lers      -0.79
lessly    -0.78
lines     -0.77
leness    -0.76
land      -0.75
Positive Logits
Token     Logit
uthor     1.14
ibaba     1.02
emia      0.95
qua       0.84
Pradesh   0.83
Devi      0.83
isa       0.82
plings    0.81
Haram     0.81
emon      0.81
Activation density: 0.023%