INDEX
Explanations
adjectives or descriptions related to opinions and evaluations
words or phrases that indicate triviality or low seriousness
Negative Logits
onds        -0.80
onding      -0.61
ioxide      -0.59
clerosis    -0.59
usalem      -0.59
stake       -0.59
xit         -0.58
asio        -0.55
counselor   -0.55
OND         -0.54
Positive Logits
ingly       1.01
istic       0.98
istically   0.97
compared    0.95
ly          0.95
sounding    0.87
enough      0.86
insofar     0.82
lly         0.81
ified       0.81
Activations Density 0.285%