INDEX
Explanations
adjectives referring to quality or significance
adjectives and descriptive phrases that convey significant emotional or evaluative weight
Negative Logits
aneers     -0.91
»Ĵ         -0.81
IDES       -0.80
ahs        -0.79
icons      -0.78
dor        -0.76
racuse     -0.76
endants    -0.74
viks       -0.74
onds       -0.72
POSITIVE LOGITS
territory  1.27
fodder     1.08
stuff      1.07
nonsense   1.04
enough     1.04
insanity   0.97
behavior   0.97
advice     0.94
folly      0.93
stupidity  0.93
Activations Density 0.306%