INDEX
Explanations
phrases related to ideas, beliefs, opinions, and positions
statements of necessity or importance regarding a topic
Negative Logits
Token       Logit
iates       -0.76
ragon       -0.66
vet         -0.65
ravel       -0.65
angering    -0.63
illon       -0.62
happ        -0.61
quer        -0.61
ords        -0.61
Tanz        -0.60
Positive Logits
Token       Logit
namely      0.86
イト        0.75
Hey         0.71
disclaimer  0.71
"'          0.70
Hey         0.68
andum       0.66
falsehood   0.66
that        0.66
disbelief   0.66
Activations Density 0.471%