INDEX
Explanations
phrases related to expressing meanings or intentions through words
statements conveying strong opinions or beliefs
Negative Logits
Token        Logit
runner       -0.71
ictive       -0.69
ogether      -0.66
edom         -0.65
uries        -0.65
estial       -0.65
tions        -0.64
urable       -0.64
ukong        -0.63
awaits       -0.63
Positive Logits
Token        Logit
implicitly   0.99
referring    0.97
implying     0.94
necessarily  0.83
tacit        0.78
infer        0.77
hypoc        0.77
kidding      0.76
unwittingly  0.75
hypocr       0.75
Activations Density: 0.458%