INDEX
Explanations
phrases indicating personal opinions or beliefs
the repeated phrase "I think that."
Negative Logits
Token       Logit
hips        -0.71
istance     -0.68
ãĥ©ãĥ³      -0.64
EMBER       -0.63
Guard       -0.62
ãĥīãĥ©      -0.61
aq          -0.60
Directions  -0.60
orah        -0.59
IELD        -0.59
Positive Logits
Token        Logit
cher         0.79
justifies    0.77
contradicts  0.73
izoph        0.73
translates   0.70
applies      0.66
sounds       0.66
settles      0.66
sounded      0.64
undermines   0.63
Activations Density: 0.259%
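
Two of the negative-logit tokens (ãĥ©ãĥ³ and ãĥīãĥ©) are hard to read as printed. They look like the printable stand-ins that a GPT-2 style byte-level BPE vocabulary uses for raw UTF-8 bytes. The source does not state which tokenizer produced these strings, so that is an assumption here. Under that assumption, the sketch below rebuilds the byte-to-character table and maps a token string back to text. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's tokens come from a vocabulary using this mapping.
    """
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token):
    """Map a byte-level token string back to readable text."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # A single token may hold only part of a multi-byte character,
    # so undecodable bytes are replaced rather than raising an error.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĥ©ãĥ³", "ãĥīãĥ©", "hips"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, the two tokens decode to the katakana fragments ラン and ドラ, while plain ASCII tokens such as hips pass through unchanged.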