Explanations
No Explanations Found
Negative Logits
COUR       -0.84
ificial    -0.83
Ambro      -0.78
Atmosp     -0.69
idences    -0.67
AUD        -0.67
bourg      -0.66
utics      -0.66
meyer      -0.66
uberty     -0.64
Positive Logits
odan        0.66
Fighters    0.62
raid        0.60
Fighter     0.59
errilla     0.59
preferring  0.58
Hitman      0.58
ascus       0.57
checkpoint  0.57
Saddam      0.57
Activations Density 0.000%
This feature has no known activations.