Explanations
phrases that indicate possession or actions attributed to subjects
Negative Logits
Token    Logit
IPC      -0.15
407      -0.15
_have    -0.15
ีà¸ŀ     -0.14
Have     -0.14
have     -0.14
have     -0.14
aret     -0.14
HAVE     -0.14
321      -0.14
Positive Logits
Token    Logit
since    0.18
Glas     0.18
become   0.17
lat      0.15
imoto    0.15
velle    0.15
Ting     0.15
doi      0.15
viso     0.15
ding     0.15
Activations Density: 0.305%