Explanations
No Explanations Found
Negative Logits
tf          -0.74
py          -0.70
Lenin       -0.69
Self        -0.63
firsthand   -0.62
proceeds    -0.62
encing      -0.62
iste        -0.61
Already     -0.61
boxing      -0.60
Positive Logits
eatures     0.79
Hawk        0.72
actionDate  0.68
oun         0.64
Magnum      0.61
Tempest     0.60
unct        0.60
Naj         0.60
Ori         0.60
Accuracy    0.59
Activations Density 0.000%
This feature has no known activations.