INDEX
Explanations
mentions of people, particularly in social contexts and interactions
Negative Logits
Ñĥг     -0.16
queda   -0.15
flown   -0.15
Loved   -0.14
risen   -0.14
itten   -0.14
aret    -0.14
ád      -0.14
uchen   -0.14
IVEN    -0.14
Positive Logits
was        0.33
was        0.28
_was       0.28
Was        0.25
Was        0.25
were       0.25
did        0.25
wasn       0.24
saw        0.23
yesterday  0.21
Activations Density 0.878%