Explanations
"<end>" symbols in various contexts
special characters or symbols, particularly variations of 'â' and 'ł'
Negative Logits
reflex      -0.71
pleasure    -0.65
condem      -0.64
fuck        -0.64
Lama        -0.64
indo        -0.63
asshole     -0.62
hung        -0.62
Sammy       -0.62
donkey      -0.62
Positive Logits
ï¸ı         0.99
ternity     0.85
uthor       0.85
denotes     0.78
_>          0.78
STEM        0.75
pecially    0.72
ãĥ´ãĤ¡      0.71
there       0.70
ACP         0.69
Activations Density 0.245%
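
Two of the positive-logit tokens listed above (ï¸ı and ãĥ´ãĤ¡) look like byte-level BPE display strings rather than ordinary text, which would also fit the explanation's mention of 'â' and 'ł'. The sketch below assumes the tokenizer uses the GPT-2 style byte-to-unicode display table; this page does not state which tokenizer is used, so treat that as an assumption. The names `build_byte_decoder` and `decode_token` are hypothetical helpers introduced only for illustration.

```python
def build_byte_decoder():
    """Invert the GPT-2 style byte-to-unicode display table (assumed here)."""
    # Bytes shown as themselves: '!'..'~', '¡'..'¬', '®'..'ÿ'.
    printable = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    table = {chr(b): b for b in printable}
    # Every remaining byte is displayed as chr(256 + n), in ascending byte order.
    n = 0
    for b in range(256):
        if b not in table.values():
            table[chr(256 + n)] = b
            n += 1
    return table


BYTE_DECODER = build_byte_decoder()


def decode_token(display):
    """Map a display-form token back to raw bytes, then decode as UTF-8."""
    # Raises KeyError for characters outside the 256-entry display table.
    raw = bytes(BYTE_DECODER[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for token in ["ï¸ı", "ãĥ´ãĤ¡"]:
        decoded = decode_token(token)
        print(repr(token), "->", [f"U+{ord(c):04X}" for c in decoded])
```

If the assumption holds, the first token decodes to the single code point U+FE0F (an emoji variation selector) and the second to the katakana pair ヴァ, so the top positive logits may be less arbitrary than their display form suggests. Partial tokens such as a lone 'â' decode to an incomplete UTF-8 sequence, which is why `errors="replace"` is used.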