INDEX
Explanations
words related to observation or analysis
phrases indicating perception or understanding
Negative Logits
rontal     -0.76
rig        -0.68
Died       -0.66
Niet       -0.64
cu         -0.63
calling    -0.62
istries    -0.62
rang       -0.61
jab        -0.60
seless     -0.60
Positive Logits
IDENT      0.76
ANC        0.72
arten      0.67
Tip        0.65
ANCE       0.63
udos       0.63
į          0.61
Footnote   0.60
Written    0.60
evidenced  0.59
Activations Density: 0.064%