INDEX
Explanations
verbs that indicate taking a specific action or making a specific decision
references to decisions and their implications
Negative Logits
staff      -0.65
enne       -0.61
Fact       -0.60
ennes      -0.60
umb        -0.59
created    -0.58
hn         -0.58
rising     -0.57
verages    -0.56
opened     -0.55
Positive Logits
ylum           0.73
mistake        0.67
olate          0.66
disappear      0.65
ãĥĥãĤ¯         0.65
vow            0.65
kish           0.65
impression     0.64
vanish         0.64
contribution   0.63
Activations Density 0.193%