INDEX
Explanations
references to fandom or enthusiasm for various topics
mentions of fans or enthusiasts
Negative Logits
ateral          -0.71
eneg            -0.68
akespe          -0.66
apeake          -0.65
Proceedings     -0.65
unfocusedRange  -0.65
giene           -0.61
Territ          -0.61
Nurs            -0.60
Lans            -0.60
Positive Logits
atical          1.34
atics           1.15
atically        1.08
boys            0.99
club            0.95
fare            0.94
atic            0.92
boy             0.88
igans           0.87
hetical         0.84
Activations Density: 0.020%