INDEX
Explanations
references to social interactions and dining settings
dining out and about
Negative Logits
n           -0.31
sa          -0.31
ngths       -0.29
uib         -0.29
ko          -0.28
ко          -0.28
ad          -0.28
Tübingen    -0.28
一          -0.28
↵↵          -0.28
Positive Logits
iſchen      0.68
صوتيه       0.63
ſicht       0.63
ſſung       0.62
iſche       0.61
<unused41>  0.61
<unused23>  0.61
<unused28>  0.61
<unused14>  0.61
<unused8>   0.61
Activations Density: 0.142%