Explanations
phrases indicating approval or praise
phrases indicating a positive assessment or state of being
Negative Logits
hyde        -0.91
rush        -0.75
atto        -0.72
ategory     -0.71
hip         -0.71
ataka       -0.68
furiously   -0.66
illary      -0.64
iferation   -0.63
ngth        -0.63
Positive Logits
enough      1.10
suited      1.00
enough      0.98
behaved     0.92
spring      0.91
baum        0.80
wired       0.76
Enough      0.76
positioned  0.76
Known       0.76
Activations Density: 0.041%