Explanations
No Explanations Found
Negative Logits
idan                  -0.91
jen                   -0.84
ÙĴ                    -0.80
Faul                  -0.77
forth                 -0.76
lees                  -0.72
channelAvailability   -0.71
inators               -0.69
Hort                  -0.68
hound                 -0.67
Positive Logits
eln          0.69
Mansion      0.67
adjud        0.65
polic        0.62
Assassin     0.61
opp          0.60
opinion      0.60
chwitz       0.60
magistrate   0.59
olicy        0.59
Activations Density: 0.000%
No Known Activations
This feature has no known activations.