Explanations
names of individuals and entities along with their associated roles or titles
Negative Logits
ìļ°         -0.17
Äįan        -0.15
/fs         -0.14
Naz         -0.14
puties      -0.14
arine       -0.14
amina       -0.14
subpoena    -0.14
ourg        -0.13
ết          -0.13
Positive Logits
196                     0.24
197                     0.20
195                     0.18
198                     0.17
Laugh                   0.16
ï¼ĪæĺŃåĴĮ               0.16
Lands                   0.15
(token text missing)    0.15
independently           0.15
USSR                    0.15
Activations Density 0.234%
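
The density figure above is commonly read as the share of token positions on which the feature fires at all. The page does not state its exact definition, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 2 of which are active.
toy = [0.0] * 998 + [1.3, 0.7]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.234% would mean the feature is active on roughly 2 to 3 of every 1000 tokens sampled.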