Explanations
references to specific actions or concepts related to possession and existence
Negative Logits
Token     Value
lander    -0.18
yll       -0.15
/=        -0.15
hall      -0.15
tridge    -0.15
FUL       -0.14
hurst     -0.14
ure       -0.14
ripper    -0.14
yar       -0.14
Positive Logits
Token       Value
ches        0.17
anki        0.16
godt        0.16
happening   0.16
ingles      0.15
asal        0.15
APPER       0.15
happen      0.14
clc         0.14
во          0.14
Activations Density: 0.247%