INDEX
Explanations
references to the concept of "magic" in various contexts
Negative Logits
akeru      -0.78
arers      -0.76
aeper      -0.74
bis        -0.74
Fas        -0.71
rontal     -0.70
fam        -0.70
Filename   -0.70
upon       -0.70
alis       -0.67
Positive Logits
wand        0.98
tricks      0.97
realism     0.90
lantern     0.87
mushrooms   0.84
carpet      0.84
ック        0.83
potion      0.82
beans       0.81
trick       0.79
Activations Density 0.008%