Explanations
No Explanations Found
Negative Logits
Token      Logit
scrut      -0.85
verty      -0.79
IFA        -0.76
earch      -0.73
ionage     -0.72
Ethiop     -0.72
fundament  -0.72
FTWARE     -0.69
unlaw      -0.67
ILL        -0.66
Positive Logits
Token      Logit
ga         0.77
nation     0.72
god        0.70
flo        0.67
reb        0.65
knock      0.64
atl        0.64
nam        0.64
gan        0.63
atra       0.62
Activations Density 0.000%
No Known Activations
This feature has no known activations.