INDEX
Explanations
phrases related to political, social, and gaming contexts
expressions emphasizing personal struggle or moral complexity
Negative Logits
Recovery     -0.74
Initialized  -0.73
rica         -0.71
ç¥ŀ          -0.68
ometry       -0.67
ode          -0.65
Compact      -0.64
Telescope    -0.63
McDonnell    -0.62
estone       -0.61
Positive Logits
sam     0.73
rals    0.69
GAN     0.69
fuckin  0.68
folk    0.67
*/(     0.67
prin    0.66
cousin  0.64
rat     0.64
freak   0.63
Activations Density 0.258%