INDEX
Explanations
phrases indicating certainty or confidence
phrases indicating certainty or inevitability
Negative Logits
ufact         -0.77
vernment      -0.75
edia          -0.66
intosh        -0.64
gdala         -0.63
annis         -0.61
INT           -0.59
olid          -0.58
asus          -0.58
akespeare     -0.57
Positive Logits
ties          1.13
fire          0.98
footed        0.95
ty            0.94
sk            0.89
faced         0.86
stre          0.73
stall         0.72
blade         0.71
ples          0.71
Activations Density 0.032%