Explanations
references to agitation or frustration
Negative Logits
Token       Logit
behavi      -0.75
untu        -0.73
¥ŀ          -0.71
ciating     -0.69
quished     -0.69
terior      -0.68
zac         -0.67
agonist     -0.67
sterdam     -0.66
issance     -0.64
Positive Logits
Token       Logit
fork        1.05
imaru       1.01
icago       0.82
IELD        0.81
y           0.80
cock        0.77
itch        0.77
Weaver      0.72
Cobb        0.71
acus        0.68
Activations Density 0.028%