INDEX
Explanations
references to digital media phenomena like GIFs and internet memes
Negative Logits
cade          -0.72
fur           -0.72
maps          -0.71
pots          -0.69
Unsure        -0.69
bats          -0.68
daq           -0.67
©¶æ¥µ         -0.67
notations     -0.67
arnaev        -0.66
Positive Logits
sorts         1.37
desperation   1.10
ignorance     1.08
stupidity     0.95
coward        0.92
humanity      0.91
imagination   0.89
folly         0.89
colonialism   0.88
laz           0.87
Activations Density 0.243%