Explanations
references to having meals or dining with others
phrases indicating social interactions
Negative Logits
Token      Logit
mart       -0.78
©¶æ¥µ      -0.73
mite       -0.71
ghai       -0.68
fighter    -0.68
arily      -0.66
ovich      -0.63
nery       -0.63
æ©Ł        -0.62
sov        -0.62
Positive Logits
Token      Logit
regards    1.17
stood      1.08
impunity   1.08
regard     1.04
standing   1.01
draw       0.99
dolls      0.89
dignity    0.85
dolphins   0.82
holding    0.82
Activations Density 0.202%