Explanations
No Explanations Found
Negative Logits
roxy          -0.91
ibaba         -0.84
ruits         -0.76
Downloadha    -0.75
renheit       -0.74
rious         -0.74
mares         -0.72
rontal        -0.72
udos          -0.70
rices         -0.70
Positive Logits
athe          0.67
iege          0.63
cry           0.63
Warden        0.62
esan          0.61
eteria        0.61
reformed      0.60
stamp         0.60
imprisonment  0.58
apartheid     0.58
Activations Density: 0.000%
This feature has no known activations.