Explanations
references to power, particularly in the context of authority, energy, or influence
NEGATIVE LOGITS
orgen      -0.17
enerator   -0.17
berger     -0.15
Torrent    -0.14
fabs       -0.14
hd         -0.13
ĽĪ         -0.13
íĹĮ        -0.13
Hakk       -0.13
avou       -0.13
POSITIVE LOGITS
lier       0.17
agged      0.17
ORB        0.16
984        0.16
mtree      0.15
anted      0.15
eil        0.15
лÑĥг       0.15
erd        0.14
lek        0.14
Activations Density: 0.027%