Explanations
mentions of specific individuals or names, particularly the name "Jared" and variations thereof
Negative Logits
ables
-0.16
Herbert
-0.15
iju
-0.15
erb
-0.14
jvu
-0.14
ie
-0.14
_:*
-0.14
isode
-0.14
stud
-0.13
blem
-0.13
Positive Logits
//{{
0.17
akin
0.16
اسة
0.15
飯åºĹ
0.15
unma
0.15
_decorator
0.14
'gc
0.14
âĸį
0.14
_USAGE
0.14
utzer
0.13
Activations Density 0.050%