INDEX
Explanations
references to teams or groups of people
Negative Logits
ervo     -0.18
dana     -0.17
heid     -0.16
EEK      -0.16
oÄŁ      -0.16
èĬ¯      -0.16
Tales    -0.15
Inspir   -0.15
ÑĥлÑİ    -0.15
.tf      -0.14
Positive Logits
neutr    0.16
ä¼       0.15
wash     0.15
dich     0.14
ampp     0.14
bon      0.14
hel      0.14
Wallace  0.14
aml      0.14
sheet    0.14
Activations Density 0.034%