INDEX
Explanations
Twitter usernames
underscore-prefixed usernames or handles
Negative Logits
Token        Logit
eteria       -0.76
Leilan       -0.73
quished      -0.71
atform       -0.69
Yose         -0.66
Manson       -0.65
blender      -0.63
reception    -0.62
screenings   -0.62
Galile       -0.62
Positive Logits
Token        Logit
ebook        1.04
blank        1.03
dict         0.99
tro          0.97
country      0.96
chance       0.94
dust         0.93
gradient     0.93
pill         0.93
page         0.92
Activations Density: 0.019%