Explanations
mention of specific individuals or names
Negative Logits
ILLA      -0.19
illas     -0.18
illa      -0.16
cia       -0.16
loo       -0.15
à¤Ī       -0.15
lassen    -0.15
olik      -0.14
wy        -0.14
ý         -0.14
Positive Logits
pty       0.16
agine     0.16
self      0.16
SELF      0.15
ernals    0.15
ags       0.15
asurable  0.15
anning    0.15
pter      0.15
alytics   0.14
Activations Density: 0.015%