Explanations
No Explanations Found
Negative Logits
bed                   -0.66
gery                  -0.66
abad                  -0.64
¯¯¯¯¯¯¯¯              -0.64
--------------------  -0.64
Protocol              -0.63
stroke                -0.61
USE                   -0.60
========              -0.59
suscept               -0.59
Positive Logits
rique      0.67
ategories  0.65
Kro        0.65
umo        0.64
ending     0.63
ategory    0.63
¨          0.62
heimer     0.61
stic       0.61
WI         0.61
Activations Density 0.000%
No Known Activations
This feature has no known activations.