Explanations
No Explanations Found
Negative Logits
aughs     -0.77
arms      -0.75
grips     -0.74
latex     -0.73
Ashton    -0.72
Lis       -0.69
Celt      -0.69
apons     -0.67
Portug    -0.67
Guer      -0.66
Positive Logits
cool      0.90
fun       0.87
no        0.86
few       0.84
don       0.84
just      0.84
whe       0.81
ev        0.78
yet       0.76
altern    0.75
Activations Density 0.000%
No Known Activations
This feature has no known activations.