Explanations
phrases related to actions accompanied by consequences or reactions
instances of reported actions or statements from the subject about threats and requests
Negative Logits
llah         -0.67
cation       -0.65
isal         -0.64
etheless     -0.63
mania        -0.62
aml          -0.62
panic        -0.62
illo         -0.62
culus        -0.61
lance        -0.58

Positive Logits
themselves    1.07
selves        0.89
selves        0.77
MpServer      0.68
helmets       0.67
mouths        0.66
uniforms      0.64
microphones   0.60
orbits        0.60
jointly       0.60
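
Dashboards of this kind typically derive the two logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and reading off the most negative and most positive entries. The following is a minimal sketch assuming that definition (no final LayerNorm folded in). `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and the toy matrices are made-up numbers, not weights behind this feature.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    decoder_dir: (d_model,) decoder direction of one feature (hypothetical input)
    W_U:         (d_model, vocab_size) unembedding matrix
    vocab:       list of decoded token strings, length vocab_size
    """
    effects = decoder_dir @ W_U          # (vocab_size,) logit change per token
    order = np.argsort(effects)          # token indices, ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens (made-up numbers).
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 1.1, -0.7],
                [3.0,  3.0, 3.0,  3.0]])
vocab = ["a", "b", "c", "d"]
negative, positive = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(negative)  # [('d', -0.7), ('b', -0.2)]
print(positive)  # [('c', 1.1), ('a', 0.5)]
```

With real weights, `decoder_dir` would be one row of the SAE decoder and `W_U` the model's unembedding. A string such as `selves` may appear twice because distinct token IDs (for example, with and without a leading space) can decode to similar text.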

Activation density: 0.834%
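
Activation density is usually reported as the share of sampled token positions on which the feature fires, meaning its activation is above zero. The following is a minimal sketch assuming that definition. `activation_density` and its `threshold` parameter are hypothetical, and the arrays are toy data.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: 834 active positions out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:834] = 0.5
print(activation_density(acts))  # 0.834
```

Under this definition, a density of 0.834% means the feature fires on roughly 834 of every 100,000 tokens.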