Explanations
No Explanations Found
Negative Logits
mortal      -0.75
unlucky     -0.70
foul        -0.66
killer      -0.65
filthy      -0.64
submission  -0.64
strike      -0.63
coworkers   -0.62
unmanned    -0.62
hob         -0.62
Positive Logits
ĸļ          1.11
arov        0.78
borough     0.76
Ban         0.73
toc         0.72
OUP         0.70
ONT         0.70
Pin         0.70
Allows      0.69
aret        0.69
Activation density: 0.000%
No Known Activations
This feature has no known activations.