Explanations
No Explanations Found
Negative Logits
Ames  -0.79
hesda  -0.70
------------------------------------------------  -0.70
indemn  -0.68
Bey  -0.67
ukong  -0.64
alse  -0.64
Aval  -0.62
Arche  -0.60
Lowell  -0.60
Positive Logits
oir  0.66
rack  0.64
SG  0.62
puberty  0.61
orr  0.61
oga  0.61
wagen  0.61
çİĭ  0.60
gorilla  0.59
GBT  0.58
Activations Density 0.000%
No Known Activations
This feature has no known activations.