INDEX
Explanations
the word "freak" or variations of it
the term "freak" and its variations
Negative Logits
adr          -0.76
conduc       -0.71
ea           -0.71
undai        -0.70
ournal       -0.69
Liberties    -0.69
ript         -0.67
eger         -0.67
vette        -0.65
HAEL         -0.63
Positive Logits
ishly        1.37
ously        1.07
onom         1.05
istically    0.97
show         0.96
ery          0.85
ish          0.84
ageddon      0.84
erers        0.84
iness        0.83
Activations Density: 0.041%
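
As a rough illustration of what a density figure like this usually means, the sketch below computes the fraction of token positions on which a feature is active. It assumes density is defined as "share of positions with activation above a threshold", which the page itself does not state. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 41 of 100,000 token positions.
sample = [1.0] * 41 + [0.0] * 99_959
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.041%
```

Under that reading, a density of 0.041% corresponds to the feature firing on roughly 4 out of every 10,000 tokens.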