Explanations
No Explanations Found
Negative Logits
Token     Logit
ingen     -0.75
tox       -0.69
iii       -0.67
char      -0.65
otent     -0.64
udic      -0.64
vind      -0.64
igne      -0.62
ii        -0.62
dam       -0.61
Positive Logits
Token     Logit
range     1.81
Range     1.39
ranges    1.18
range     1.11
relative  0.89
Walters   0.81
lower     0.78
Lower     0.76
Range     0.75
orsi      0.74
Activations Density: 0.000%
No Known Activations
This feature has no known activations.