Explanations
words related to actions or processes
references to actions, dimensions, and functional characteristics
Negative Logits
aren         -0.76
Torn         -0.66
letes        -0.65
sers         -0.64
Clintons     -0.64
illin        -0.64
regn         -0.62
oute         -0.61
ominated     -0.60
zzy          -0.59
Positive Logits
guise         0.98
form          0.80
manner        0.80
mode          0.78
fashion       0.77
shape         0.72
ividual       0.70
vicinity      0.65
proportions   0.64
staging       0.64
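Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that method; `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and the scaling used for the dashboard values is not stated here, so toy inputs will not reproduce those exact numbers.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumed inputs (hypothetical, for illustration):
      decoder_dir: (d_model,) decoder vector for one feature
      W_U:         (d_model, vocab_size) unembedding matrix
      vocab:       token strings aligned with the columns of W_U
    """
    effects = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, -2.0]])
decoder_dir = np.array([1.0, 0.5])
neg, pos = top_logit_effects(decoder_dir, W_U, ["form", "manner", "aren", "Torn"], k=2)
print(neg)  # [('aren', -1.0), ('Torn', -0.5)]
print(pos)  # [('form', 1.0), ('manner', 0.5)]
```

Sorting once and reading both ends keeps the negative and positive lists consistent with each other.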
Activation density: 0.246%
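Assuming activation density means the fraction of sampled tokens on which the feature's activation is above zero, a figure of 0.246% corresponds to roughly 246 firing tokens per 100,000. The sketch below computes that fraction; `activation_density` and its `threshold` parameter are hypothetical names, and the dashboard's actual sampling and threshold are not given here.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy example: the feature fires on 246 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:246] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.246%
```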