Explanations
No Explanations Found
Negative Logits
anto          -0.90
iw            -0.85
igr           -0.78
udeb          -0.78
jew           -0.75
displayText   -0.75
achi          -0.74
istine        -0.74
ander         -0.72
FOX           -0.72
Positive Logits
Persona       0.65
Petr          0.64
Presence      0.64
airspace      0.62
compos        0.62
senal         0.62
etheless      0.61
Candle        0.61
Masquerade    0.60
Byz           0.60
Activations Density 0.000%
No Known Activations
This feature has no known activations.