INDEX
Explanations
references to people or entities frequently associated with the letter 's' in their names
Negative Logits
Token       Logit
citoy       -0.63
Avez        -0.63
humer       -0.61
Πολ         -0.61
aveug       -0.60
waypoints   -0.60
kater       -0.60
ầng         -0.59
uParam      -0.58
entanto     -0.58
Positive Logits
Token       Logit
s           1.49
".          0.98
’)          0.98
"])         0.98
(blank)     0.96
s           0.93
’,          0.91
'))         0.91
”]          0.89
'].'        0.88
Activations Density: 0.207%