INDEX
Explanations
the word "ton" with high activation values
the repeated mention of the word "ton."
Negative Logits
Token          Logit
Phant          -0.71
Craft          -0.70
Danger         -0.67
Cust           -0.67
ELD            -0.67
Constructed    -0.67
Clinic         -0.63
Journal        -0.61
Forever        -0.61
pring          -0.60
Positive Logits
Token          Logit
neau           1.05
ne             0.92
arg            0.90
Ton            0.89
aton           0.89
ights          0.87
umen           0.87
eful           0.86
earthqu        0.86
icum           0.85
Activations Density: 0.018%