INDEX
Explanations
specific actions or behaviors described in a narrative context
phrases indicating responses and inquiries in conversations
Negative Logits
agate    -0.77
road     -0.75
isode    -0.75
zhou     -0.72
joice    -0.70
rift     -0.67
angular  -0.67
hazard   -0.65
hill     -0.64
hab      -0.63
Positive Logits
himself      1.34
his          1.03
detractors   0.97
opponents    0.94
reporters    0.92
others       0.91
anyone       0.90
everyone     0.90
anybody      0.89
teammates    0.89
Activation Density: 0.996%