Explanations
No Explanations Found
Negative Logits
proport      -0.73
»Ĵ           -0.72
chwitz       -0.67
classmates   -0.67
reconc       -0.66
ptive        -0.63
ima          -0.62
Intake       -0.62
olars        -0.61
destro       -0.60
Positive Logits
cover         0.70
sleeper       0.69
ard           0.66
bidden        0.65
itary         0.63
gal           0.61
bull          0.61
ALE           0.59
WARN          0.59
iesta         0.58
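The logit lists rank vocabulary tokens by how strongly this feature lowers (negative) or raises (positive) their output logits. The page does not say how these values are computed. The sketch below assumes a common approach: take the dot product of the feature's decoder direction with each token's unembedding vector, then sort. The functions `logit_effects` and `top_and_bottom`, the toy vectors, and the four-token vocabulary are all hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: assumes a feature's logit effect on a token is the dot
# product of the feature's decoder direction with that token's unembedding
# vector. All names and numbers here are illustrative, not from the page.

def logit_effects(decoder_dir, unembed_cols):
    """Return {token: dot(decoder_dir, unembedding vector)} for each token."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed_cols.items()
    }

def top_and_bottom(effects, k):
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive

# Toy 2-dimensional example with a four-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed_cols = {
    "cover": [0.6, -0.2],
    "proport": [-0.5, 0.4],
    "sleeper": [0.3, -0.6],
    "classmates": [-0.4, 0.2],
}
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed_cols), k=2)
print([tok for tok, _ in neg])  # ['proport', 'classmates']
print([tok for tok, _ in pos])  # ['cover', 'sleeper']
```

On a real model the same ranking would run over the full vocabulary, with `k=10` matching the ten entries shown per list.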
Activation Density 0.000%
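The page does not define activation density. Assuming it is the percentage of sampled tokens on which the feature's activation is positive, a value of 0.000% means the feature never fired on the sampled text, which matches the "No Known Activations" note. A minimal sketch under that assumption; the function name `activation_density` is hypothetical.

```python
# Hypothetical sketch: assumes "activation density" is the percentage of
# sampled tokens whose feature activation is strictly positive.

def activation_density(activations):
    """Percentage of activations that are > 0; returns 0.0 for an empty sample."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)

print(f"{activation_density([0.0] * 1000):.3f}%")          # prints 0.000%
print(f"{activation_density([0.0, 0.0, 1.2, 0.0]):.3f}%")  # prints 25.000%
```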
No Known Activations
This feature has no known activations.