INDEX
Explanations
statements with strong opinions or beliefs
instances of authority and response situations
Negative Logits
Token      Logit
Reynolds   -0.65
Vu         -0.63
Block      -0.62
valve      -0.60
ãĤ¬        -0.60
ilon       -0.59
indexes    -0.58
Bell       -0.57
Bever      -0.57
olin       -0.57
Positive Logits
Token        Logit
suddenly     0.90
uddenly      0.80
?!"          0.78
disrespect   0.76
blatantly    0.74
someone      0.72
ocious       0.72
?!           0.70
ammed        0.70
ebook        0.70
Activations Density 1.098%