INDEX
Explanations
references to social interactions and relationships
Negative Logits
".        -0.85
iſt       -0.77
فريبيس    -0.76
myſelf    -0.76
---+      -0.72
ſind      -0.72
poffible  -0.71
himſelf   -0.69
$_"       -0.69
quæ       -0.69
Positive Logits
oh        1.67
Oh        1.64
Oh        1.53
oh        1.40
ah        1.33
Ah        1.29
Ah        1.28
Wow       1.27
wow       1.27
Oops      1.27
Activations Density: 0.474%