INDEX
Explanations
the word "mod" with varying degrees of strength
the presence of the word "mod" and related terms associated with moderation or modification
NEGATIVE LOGITS
vana        -0.75
ibaba       -0.71
jriwal      -0.65
ï¸          -0.64
©¶æ         -0.63
ISA         -0.61
Flavoring   -0.61
Enlarge     -0.60
mble        -0.60
Adapt       -0.59
POSITIVE LOGITS
erella      0.75
opol        0.72
ooth        0.65
ighters     0.65
ail         0.61
oin         0.61
ruck        0.60
asty        0.60
nant        0.59
ude         0.58
Activations Density 0.158%