Explanations
mentions of being a fan of something or someone
references to fandom or being a fan of various subjects or interests
Negative Logits
Token            Logit
apeake           -0.74
muddy            -0.66
ENCY             -0.65
terday           -0.64
eneg             -0.62
ateral           -0.62
unfocusedRange   -0.61
dishonest        -0.59
dilig            -0.59
akespe           -0.59
Positive Logits
Token            Logit
atical           1.45
atics            1.17
fiction          1.11
atically         1.04
club             1.03
boys             1.00
artist           0.98
fare             0.97
boy              0.91
atic             0.91
Activations Density: 0.023%
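
As a rough illustration of how a density figure like the one above might be read, the sketch below assumes it is the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions for illustration and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires at all. The dashboard itself does not state this.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, only 2 of which activate the feature.
sample = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.023% would correspond to the feature firing on roughly 23 out of every 100,000 tokens.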