INDEX
Explanations
names or words related to political figures or entities
references to individuals or entities characterized by the term "erg."
Negative Logits
İĭ       -0.85
Mub      -0.74
going    -0.68
Fram     -0.67
©¶æ¥µ    -0.63
ften     -0.62
ĺħ       -0.62
fty      -0.62
LOAD     -0.62
creen    -0.61
Positive Logits
roup     0.96
ensen    0.92
raham    0.90
ling     0.89
reen     0.89
onomic   0.87
raphic   0.87
ues      0.85
roups    0.84
lings    0.83
Activations Density: 0.016%