Explanations
names of people and locations
Negative Logits
SOURCE     -0.73
MSG        -0.68
envy       -0.61
pleasure   -0.61
FontSize   -0.60
Pokemon    -0.60
Californ   -0.60
Beacon     -0.59
fuck       -0.59
menace     -0.59

Positive Logits
oub        1.07
utsch      1.04
ymes       0.99
ollah      0.99
abis       0.96
iaz        0.92
erman      0.92
ij         0.92
alez       0.91
iani       0.91
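
The two tables rank output tokens by how strongly this feature pushes their logits down or up. A minimal sketch of one common way such lists are produced, assuming the score for each token is the dot product of the feature's decoder direction with that token's unembedding column (the dashboard does not state its exact method). The names `logit_effects` and `top_and_bottom` are hypothetical, and plain lists stand in for real model weights.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed scoring: dot product of the feature's decoder direction
    with each token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


if __name__ == "__main__":
    # A few values copied from the tables above, used as precomputed scores.
    effects = {"oub": 1.07, "utsch": 1.04, "SOURCE": -0.73, "MSG": -0.68}
    pos, neg = top_and_bottom(effects, 2)
    print(pos)  # [('oub', 1.07), ('utsch', 1.04)]
    print(neg)  # [('SOURCE', -0.73), ('MSG', -0.68)]
```

Under this reading, the positive list being dominated by name-like subword fragments (`utsch`, `alez`, `iani`) is consistent with the "names of people and locations" explanation.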

Activation density: 0.111%
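
A density of 0.111% means the feature fires on roughly 1.1 tokens per 1,000, or about 1 in 900. A minimal sketch of the calculation, assuming density is the percentage of sampled tokens on which the feature's activation is strictly above zero; the dashboard does not say which threshold or token sample it uses, and `activation_density` is a hypothetical name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`
    (assumed definition; threshold 0.0 means 'fired at all')."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Toy sample: the feature fires on 1 token out of 1,000.
    acts = [0.0] * 999 + [3.2]
    print(activation_density(acts))  # 0.1
```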