INDEX
Explanations
words and phrases associated with measurements or evaluations
Negative Logits
lyph     -0.16
eldre    -0.15
mts      -0.15
tw       -0.15
gba      -0.14
cus      -0.14
rch      -0.14
öy       -0.14
ascar    -0.14
asure    -0.14
Positive Logits
Esper    0.32
Ä        0.31
esper    0.26
Ä        0.25
Å        0.23
estas    0.23
Å        0.22
kun      0.20
la       0.20
mall     0.20
Activations Density: 0.001%
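
For scale, a density of 0.001% corresponds to roughly one active token position in every 100,000. The sketch below shows that arithmetic. It assumes that "density" means the fraction of token positions whose activation exceeds a threshold; the page does not define the term. The function name `activation_density`, the threshold of 0.0, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: this is what the dashboard's "Activations Density" reports;
    the source page does not define the metric.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active position among 100,000 tokens.
sample = [0.0] * 99_999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.001%
```

A feature this sparse fires on very few inputs, so its top-activating examples come from a small sample of text.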