Explanations
references to individuals and their actions or feelings, indicating a focus on characters and their interactions
Negative Logits
ly         -0.16
bsp        -0.15
Clo        -0.15
iously     -0.14
enz        -0.14
öy         -0.14
comm       -0.14
ely        -0.14
aho        -0.14
arily      -0.13
Positive Logits
oret       0.18
ведь       0.16
certainly  0.16
orem       0.15
inerary    0.15
alth       0.14
oretical   0.14
iag        0.14
zel        0.14
intros     0.14
Activations Density 0.563%