Explanations
the word "being" in various contexts
Negative Logits
меть        -0.17
lue         -0.15
-lnd        -0.15
empor       -0.15
ittest      -0.15
hire        -0.15
heimer      -0.15
нок         -0.15
archical    -0.15
Available   -0.14
Positive Logits
ness        0.23
360         0.18
awan        0.17
held        0.17
apiro       0.16
eld         0.16
actively    0.16
groom       0.15
abouts      0.15
ing         0.15
Activations Density 0.026%