INDEX
Explanations
pranks or pranking-related words or phrases
terms related to pranks and mischievous activities
Negative Logits
Twain       -0.81
farm        -0.79
ansas       -0.74
eva         -0.73
aeda        -0.73
Swe         -0.70
medi        -0.67
querque     -0.65
azy         -0.64
Abortion    -0.61
Positive Logits
prank       0.95
iona        0.81
-+-+-+-+    0.79
func        0.74
PIN         0.71
mischief    0.70
Pry         0.70
obs         0.69
oun         0.66
orrow       0.66
Activations Density: 0.008%