INDEX
Explanations
the word "I" as a way of identifying personal statements or subjective experiences
Negative Logits
O           -0.15
avenport    -0.14
pher        -0.14
borough     -0.14
ses         -0.14
mont        -0.13
.Lib        -0.13
cca         -0.13
Du          -0.13
cco         -0.13
Positive Logits
weg         0.17
šker        0.15
968         0.15
368         0.15
ubo         0.14
olina       0.14
adays       0.14
aklı        0.14
sled        0.14
iag         0.14
Activations Density: 0.154%