INDEX
Explanations
mentions of a particular person's name: "Mans"
references to the name "Mans" and variations of it
Negative Logits
vernment    -0.77
é¾įåĸļ士    -0.72
ties        -0.70
WORK        -0.64
Matrix      -0.62
NEY         -0.61
Dialogue    -0.61
èª          -0.60
IENCE       -0.59
scill       -0.59
Positive Logits
laughter    1.20
pread       1.04
chwitz      0.92
hap         0.90
auga        0.90
plain       0.87
sein        0.87
field       0.83
bridge      0.81
ouri        0.81
Activations Density 0.028%