Explanations
phrases related to qualities or features of things
phrases emphasizing the existence or presence of certain problems or characteristics
Negative Logits
²¾        -0.90
opian     -0.78
arest     -0.76
anches    -0.74
iddles    -0.72
eli       -0.72
姫        -0.72
isms      -0.71
Roads     -0.71
enders    -0.71
Positive Logits
kind          1.47
type          1.40
sort          1.34
kinds         1.02
capability    1.02
exact         0.99
happen        0.98
trope         0.97
tactic        0.97
happening     0.96
Activations Density 0.188%