Explanations
phrases related to communication or reporting
quoted speech and references to individuals making statements
Negative Logits
grad       -0.64
Wiz        -0.62
Purple     -0.62
Vict       -0.61
────       -0.60
verts      -0.60
phys       -0.59
penet      -0.58
bedroom    -0.57
IVERS      -0.57
Positive Logits
rite       0.73
arta       0.73
edom       0.70
pherd      0.70
oversaw    0.70
echoed     0.69
advised    0.69
oversees   0.68
itage      0.68
imon       0.67
Activations Density 0.525%