Explanations
No Explanations Found
Negative Logits
irted        -0.68
uctor        -0.66
ileged       -0.65
agi          -0.65
agogue       -0.65
rious        -0.64
atted        -0.64
Presents     -0.64
icultural    -0.63
inatory      -0.63
Positive Logits
emet         0.66
Dynam        0.63
slow         0.61
Xer          0.61
sword        0.60
dies         0.59
Nusra        0.58
Hannibal     0.58
heid         0.58
Edwin        0.58
Activations Density: 0.000%
No Known Activations
This feature has no known activations.