INDEX
Explanations
names or mentions of a specific person
variations of the word "take."
Negative Logits
Token      Logit
ãĥ£        -0.74
enegger    -0.69
oad        -0.67
Weasley    -0.63
ãĤ¡        -0.60
cfg        -0.59
aldi       -0.59
fired      -0.57
inatory    -0.56
swick      -0.55
Positive Logits
Token      Logit
Maker      0.81
warm       0.80
akes       0.78
ñ          0.78
aimon      0.77
asy        0.73
yon        0.73
velop      0.73
yi         0.72
maker      0.71
Activations Density 0.033%
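
The page does not define the density figure. A minimal sketch, assuming it means the share of sampled token positions on which the feature's activation is above zero; the function name, the threshold, and the sample data are all hypothetical and chosen only to reproduce a value of 0.033%:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions on which the
    feature fires. The page itself does not state the definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 33 of them active.
sample = [0.0] * 99_967 + [1.5] * 33
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.033%
```

Under that reading, a density of 0.033% corresponds to roughly 33 active positions per 100,000 tokens, which would make this a sparse feature.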