INDEX
Explanations
questions starting with 'what'
Negative Logits
Interstitial  -0.81
DERR          -0.74
mens          -0.73
renheit       -0.70
apsed         -0.70
Said          -0.69
agos          -0.66
anus          -0.66
uffer         -0.65
ache          -0.64
Positive Logits
?        0.95
?'"      0.90
?'       0.89
exactly  0.87
!?       0.85
?"       0.84
?".      0.84
?ãĢį     0.81
?",      0.81
happens  0.79
Activations Density 0.050%