INDEX
Explanations
No Explanations Found
Negative Logits
oshenko   -0.78
tones     -0.78
cius      -0.76
abilia    -0.73
cdn       -0.71
Ukraine   -0.71
nos       -0.70
vt        -0.68
forum     -0.67
auer      -0.65
Positive Logits
destro    0.68
ppa       0.67
bulldo    0.62
Luffy     0.61
Franch    0.60
gers      0.60
scrub     0.60
aylor     0.59
ppo       0.59
Viol      0.56
Activations Density 0.000%
No Known Activations
This feature has no known activations.