Explanations
names and terms related to various individuals or entities
references to specific individuals or positions of authority related to political or organizational contexts
Negative Logits
Token           Logit
æ©              -0.83
ãĤ¨ãĥ«          -0.75
straw           -0.72
=-=-=-=-        -0.71
çīĪ             -0.70
اÙĦ             -0.70
Takeru          -0.69
å£              -0.69
ãĤ¼ãĤ¦ãĤ¹       -0.68
EVENTS          -0.66
Positive Logits
Token           Logit
rians           0.86
closure         0.86
omaly           0.86
ression         0.78
rius            0.76
Syn             0.75
rity            0.75
uve             0.73
ury             0.73
rian            0.73
Activations Density: 0.060%
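
Several of the token strings in the logit lists (for example ãĤ¼ãĤ¦ãĤ¹ and çīĪ) look like raw vocabulary entries from a byte-level BPE tokenizer rather than readable text. The page does not name the tokenizer, so the sketch below is only an illustration under the assumption that a GPT-2-style byte-to-character mapping is in use. The helper names `gpt2_byte_decoder` and `decode_token` are hypothetical and not part of any library.

```python
def gpt2_byte_decoder():
    """Build a map from vocabulary characters back to raw bytes.

    Assumption: the tokens use the GPT-2 byte-level convention, where
    printable Latin-1 bytes map to themselves and every other byte is
    shifted to code points starting at 256.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    decoder = {chr(b): b for b in printable}
    offset = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are assigned the next free code point >= 256.
            decoder[chr(256 + offset)] = b
            offset += 1
    return decoder


_DECODER = gpt2_byte_decoder()


def decode_token(token: str) -> str:
    """Turn a raw vocabulary string into readable text.

    Tokens that hold only part of a multi-byte UTF-8 character cannot be
    decoded alone, so invalid sequences become U+FFFD instead of raising.
    """
    raw = bytes(_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¼ãĤ¦ãĤ¹", "ãĤ¨ãĥ«", "çīĪ", "straw"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĤ¼ãĤ¦ãĤ¹ decodes to ゼウス, ãĤ¨ãĥ« to エル, and çīĪ to 版. Two-character entries such as æ© and å£ appear to be incomplete UTF-8 sequences, which is why they have no readable form on their own.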