Explanations
negative or critical adjectives
negative descriptors and critique-related terms
Negative Logits
Token         Logit
Downloadha    -0.82
iler          -0.78
ologue        -0.73
udeb          -0.70
rongh         -0.69
»Ĵ            -0.69
opez          -0.68
roma          -0.68
hover         -0.67
onde          -0.67
Positive Logits
Token         Logit
enough        1.14
territory     0.84
insanity      0.84
storytelling  0.83
folly         0.82
nonsense      0.82
speak         0.81
coincidence   0.81
fodder        0.78
stupidity     0.78
Activations Density: 0.359%