INDEX
Explanations
words related to trust and betrayal in a confrontational or survival context
Negative Logits
m
-0.44
la
-0.42
ante
-0.42
-0.42
Do
-0.42
chen
-0.41
Ignore
-0.41
i
-0.41
${
-0.41
M
-0.40
Positive Logits
للمعارف
1.16
للاسماء
0.95
تانيه
0.95
myſelf
0.88
Personensuche
0.88
ſche
0.82
themſelves
0.81
fhew
0.81
Numerade
0.79
poffe
0.79
Activations Density 0.010%