INDEX
Explanations
phrases indicating familiarity and knowledge about someone or something
Negative Logits
igo       -0.17
jure      -0.15
keley     -0.15
наÑĩе     -0.14
jte       -0.14
ega       -0.14
&W        -0.14
ickey     -0.14
laps      -0.14
iew       -0.13
Positive Logits
grips     0.27
Gri       0.20
asty      0.19
know      0.18
bottom    0.17
PointXYZ  0.16
Bottom    0.16
BOTTOM    0.15
788       0.15
work      0.15
Activations Density: 0.035%