Explanations
instances of the word "well"
repeated use of the word "well."
Negative Logits
hyde         -0.88
atto         -0.78
ategory      -0.76
hip          -0.72
rush         -0.70
ataka        -0.69
furiously    -0.65
omore        -0.65
amera        -0.65
iferation    -0.64
Positive Logits
enough       1.10
enough       0.93
spring       0.90
suited       0.90
baum         0.79
reads        0.77
Known        0.76
Enough       0.76
behaved      0.75
played       0.73
Activations Density 0.039%