Explanations
concepts related to progress and achievement
Negative Logits
ALLE      -0.15
ÏĦε       -0.15
mat       -0.14
vrd       -0.14
255       -0.13
Neutral   -0.13
Schl      -0.13
intense   -0.13
yc        -0.12
ẩm        -0.12
Positive Logits
Exped     0.17
icÃŃ      0.16
rophe     0.15
inski     0.15
avanaugh  0.15
etat      0.15
inae      0.15
oice      0.14
би        0.14
andre     0.14
Activations Density 0.038%
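
The density figure above is most naturally read as the fraction of sampled tokens on which the feature fires at all. The page does not state its exact definition, so the sketch below assumes "fires" means an activation strictly greater than zero. The function name `activation_density` and the sample data are hypothetical, chosen only to reproduce a value of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: density = (tokens where the feature fires) / (all tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, of which 38 have a nonzero activation.
sample = [0.0] * 99_962 + [0.5] * 38
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

A density this low means the feature is sparse: under the assumption above, it is active on roughly 38 tokens out of every 100,000.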