Explanations
longer words with suffixes indicating a state or condition
terms related to abstract concepts and evaluation
Negative Logits
enegger    -0.65
ogie       -0.62
sylv       -0.59
ouk        -0.56
Noon       -0.55
istani     -0.55
Bie        -0.54
razen      -0.53
Colomb     -0.53
Tid        -0.53
Positive Logits
fallacy    0.82
(%)        0.79
¶          0.78
:=         0.73
Increases  0.73
iveness    0.70
Allows     0.70
lessly     0.67
doesnt     0.65
=          0.65
Activations Density 0.412%