Explanations
negative or unusual phenomena
Negative Logits
(     1.92
(     1.77
(~    1.71
([    1.52
”(    1.51
(     1.51
(     1.44
”(    1.43
(\    1.42
(&    1.40
Positive Logits
faction      0.78
abortions    0.68
fucked       0.65
dearth       0.63
bitch        0.63
parity       0.63
grotesque    0.62
Qaeda        0.62
Fuck         0.61
দোষ          0.60
Activations Density: 0.118%