Explanations
expressions of inability or difficulty
Negative Logits
inator     -0.17
akov       -0.15
vo         -0.15
ảy         -0.15
Pornhub    -0.15
ichert     -0.14
ularity    -0.14
doors      -0.14
auc        -0.14
lol        -0.14
Positive Logits
beat        0.36
resist      0.33
Beat        0.28
resisting   0.27
Resist      0.27
beating     0.26
Beat        0.26
resistance  0.26
beats       0.25
Resistance  0.24
Activations Density: 0.043%