Explanations
names with the word "Fi" in them
mentions of a specific individual or character
Negative Logits
Token               Logit
eers                -0.98
manship             -0.81
stage               -0.73
xual                -0.70
tons                -0.69
*/(                 -0.68
eer                 -0.68
tenance             -0.66
Dispatch            -0.65
ãĤµãĥ¼ãĥĨãĤ£ãĥ¯ãĥ³  -0.64
Positive Logits
Token               Logit
ennes               1.13
Fi                  1.06
anca                0.96
andom               0.95
endish              0.93
ornia               0.92
Fi                  0.89
oms                 0.89
oti                 0.89
endi                0.86
Activations Density 0.010%