INDEX
Explanations
references to specific individuals or groups involved in research or artistic endeavors
Negative Logits
ardu            -0.15
esto            -0.15
mall            -0.14
COPE            -0.14
Sachs           -0.14
Commonwealth    -0.14
lemn            -0.14
SCI             -0.14
outu            -0.14
eder            -0.13
Positive Logits
bean            0.17
para            0.15
-sama           0.15
zeug            0.15
Mitar           0.14
pread           0.14
intern          0.14
πρό             0.14
umpy            0.14
水平            0.14
Activations Density 0.060%