INDEX
Explanations
the word "mean" or variations of it used to inquire about interpretation or significance
queries about the meaning or implications of statements
Negative Logits
hma       -0.72
tan       -0.69
uci       -0.68
iencies   -0.67
ttes      -0.66
visors    -0.66
sweat     -0.63
visor     -0.63
thora     -0.63
mention   -0.63
Positive Logits
exactly           0.78
psychologically   0.74
clinically        0.73
goodbye           0.71
GOODMAN           0.67
NEXT              0.66
today             0.66
anyway            0.64
.</               0.63
mechanically      0.63
Activations Density 0.032%