INDEX
Explanations
terms related to significant actions or decisions being made
references to significant actions or decisions
Negative Logits
omial       -0.73
sqor        -0.70
etheless    -0.67
aez         -0.67
vae         -0.66
english     -0.64
sung        -0.63
aylor       -0.63
oola        -0.63
Koran       -0.62

Positive Logits
toward       0.94
towards      0.93
able         0.85
ments        0.84
backs        0.82
over         0.76
igraph       0.73
rers         0.71
abouts       0.71
decisively   0.71
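On dashboards of this kind, the positive and negative logits are commonly the feature's direct effect on the output vocabulary: the feature's decoder direction multiplied by the model's unembedding matrix, with the largest and smallest entries listed. The sketch below assumes that convention. The function names and the toy vectors are hypothetical and are not taken from this model.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: each listed logit is the dot product of the feature's decoder
# direction with one token's unembedding column (its direct logit effect).
def logit_effects(
    decoder_direction: Sequence[float],
    unembed_columns: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Return the direct logit effect of the feature on every token."""
    effects = {}
    for token, column in unembed_columns.items():
        if len(column) != len(decoder_direction):
            raise ValueError(f"dimension mismatch for token {token!r}")
        effects[token] = sum(d * u for d, u in zip(decoder_direction, column))
    return effects


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split effects into the k most positive and k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return positive, negative


# Hypothetical toy data: a 2-dimensional model with a 4-token vocabulary.
decoder = [1.0, 0.5]
unembed = {
    "tok_a": [0.8, 0.3],
    "tok_b": [-0.6, -0.2],
    "tok_c": [0.1, 0.0],
    "tok_d": [-0.2, -0.4],
}
positive, negative = top_and_bottom(logit_effects(decoder, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

With real weights the vocabulary has tens of thousands of columns and `k` is 10, which matches the length of each list above.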

Activation Density: 0.032%
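Activation density is usually the fraction of token positions on which the feature fires (activation above zero). Assuming that definition, 0.032% is a fraction of 0.00032, or about 32 firing tokens per 100,000, roughly one token in 3,125. A minimal sketch of the computation; the function name, threshold, and sample are hypothetical.

```python
from typing import Sequence

# Assumption: density = share of token positions where the feature's
# activation is strictly above a threshold (0.0 by default).
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions on which the feature fires."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 32 firing positions out of 100,000 tokens.
sample = [0.0] * 99_968 + [1.7] * 32
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.032%
```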