INDEX
Explanations
phrases related to providing information or explanations
phrases related to communication and reporting of statements
Negative Logits
itton       -0.85
audi        -0.67
Discussion  -0.66
ranch       -0.66
zzi         -0.63
assic       -0.61
atti        -0.61
Tes         -0.61
absor       -0.60
rather      -0.60
Positive Logits
anymore     1.19
nor         1.14
specifics   1.01
anything    1.01
acknow      0.97
whatsoever  0.87
any         0.81
anywhere    0.77
until       0.76
anybody     0.75
Activations Density: 0.175%