Explanations
phrases related to expressing opinions or making decisions
phrases indicating the need for action or change
Negative Logits
Token     Logit
ady       -0.73
izens     -0.65
izen      -0.63
emaker    -0.61
eny       -0.61
afe       -0.59
owe       -0.59
raft      -0.58
ゴン      -0.57
ower      -0.57
Positive Logits
Token     Logit
yeah      1.06
huh       0.97
namely    0.92
etc       0.89
sir       0.86
blah      0.86
…"        0.85
whereas   0.85
maybe     0.83
[         0.82
Activations Density: 0.447%