INDEX
Explanations
locations or entities
special characters or symbols in the text
Negative Logits
erest      -0.77
ried       -0.69
level      -0.66
therap     -0.66
lifes      -0.66
orts       -0.64
proced     -0.62
optimum    -0.62
ling       -0.62
odes       -0.62
Positive Logits
perhaps         1.04
including       1.03
namely          1.02
meaning         0.98
––              0.97
something       0.88
they            0.87
along           0.85
particularly    0.85
even            0.84
Activations Density: 0.188%