INDEX
Explanations
No Explanations Found
Negative Logits
digestion     -0.74
peat          -0.70
bandwagon     -0.68
meat          -0.64
machines      -0.62
licenses      -0.62
expansions    -0.61
hiber         -0.60
ktop          -0.60
hare          -0.59
Positive Logits
oshop         0.84
uran          0.74
ndra          0.74
izophren      0.69
̶             0.69
erman         0.68
:=            0.68
oman          0.67
auder         0.67
isconsin      0.66
Activations Density 0.000%
This feature has no known activations.