INDEX
Explanations
references to decisions or actions made by individuals or groups
occurrences of the word "move" indicating actions or decisions
Negative Logits
sqor        -0.79
omial       -0.75
english     -0.67
Koran       -0.63
sung        -0.61
etheless    -0.60
inges       -0.59
risen       -0.59
oola        -0.58
icum        -0.57
Positive Logits
backs       0.83
ments       0.82
able        0.82
over        0.81
ment        0.81
toward      0.77
overs       0.76
towards     0.76
brates      0.76
rers        0.73
Activations Density 0.031%