INDEX
Explanations
topics related to personal identity and relationships
Negative Logits
]--;      -0.65
<bos>     -0.64
which     -0.60
}\]       -0.60
ואת       -0.58
which     -0.57
والتي     -0.56
()        -0.55
SizeF     -0.55
}(        -0.55
Positive Logits
FTW       1.12
ftw       1.12
?         1.03
anyone    0.91
galore    0.89
?!        0.83
indeed    0.81
=         0.79
huh       0.78
!         0.75
Activations Density 0.626%
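
As a rough illustration of what the density figure measures, the sketch below assumes it is the fraction of tokens on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the sample values are assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The page itself does not define the term.
    """
    activations = list(activations)
    if not activations:
        # No tokens observed, so report zero rather than dividing by zero.
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations: the feature fires on 1 of 8 tokens.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(sample):.3%}")
```

Under that assumption, a density of 0.626% would correspond to the feature firing on roughly 6 of every 1,000 tokens.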