INDEX
Explanations
pronouns and expressions indicating self-reference or identity
Negative Logits
seamnă       -0.62
kasarigan    -0.59
htë          -0.50
îné          -0.46
kimi         -0.46
belki        -0.46
mogat        -0.45
Incoming     -0.45
doros        -0.43
Incoming     -0.42
Positive Logits
')],               0.80
protoimpl          0.76
tagHelperRunner    0.75
}")                0.75
ddelweddau         0.73
"))                0.72
BorderRadius       0.72
'))                0.72
')}                0.72
""")               0.72
Activations Density 0.133%