Explanations
No Explanations Found
Negative Logits
sacrific    -0.73
destro      -0.73
aimon       -0.71
iverpool    -0.68
rup         -0.66
exting      -0.65
anism       -0.64
ministic    -0.64
proble      -0.62
umenthal    -0.62
Positive Logits
ĸļ          0.85
drawn       0.76
plane       0.66
cousins     0.64
proven      0.60
developed   0.59
been        0.59
enance      0.59
fewer       0.59
nothing     0.59
Activations Density 0.000%
No Known Activations: this feature has no known activations.