INDEX
Explanations
phrases indicating approval or positive evaluation
words that indicate a subjective or personal experience
Negative Logits
chess      -0.68
Skydragon  -0.67
mathemat   -0.66
Alban      -0.65
pyramid    -0.65
trainers   -0.64
shelter    -0.64
mic        -0.63
ash        -0.63
Samar      -0.63
Positive Logits
ve         1.24
should     1.20
felt       1.19
re         1.18
ought      1.15
sure       1.14
sent       1.13
tre        1.12
shall      1.11
been       1.11
Activations Density: 0.301%