Explanations
verbs related to concrete actions or knowledge acquisition
Negative Logits
boa       -0.68
otion     -0.63
ŃĶ        -0.62
otos      -0.62
erva      -0.61
aminer    -0.60
enda      -0.59
rid       -0.58
ÅĤ        -0.58
Dj        -0.57
Positive Logits
lege          0.57
cut           0.54
beforehand    0.54
ledged        0.53
Aware         0.52
ãĥĨ           0.51
Benefit       0.51
itarian       0.50
stood         0.49
hardship      0.48
Activations Density: 5.967%