Explanations
No Explanations Found
Negative Logits
Token        Logit
Zin          -0.67
NPR          -0.67
dug          -0.66
Omn          -0.64
washed       -0.64
levision     -0.63
alach        -0.63
olls         -0.61
IPM          -0.60
writer       -0.60
Positive Logits
Token        Logit
ingen        0.87
icit         0.70
uble         0.69
agogue       0.67
inge         0.67
amphetamine  0.66
igure        0.65
mberg        0.63
uces         0.63
ibaba        0.62
Activations Density: 0.000%
No Known Activations
This feature has no known activations.