INDEX
Explanations
certain words or phrases enclosed in quotation marks
quoted phrases or dialogue
Negative Logits
killers      -0.77
adjud        -0.74
cancell      -0.74
tabloid      -0.72
hypothal     -0.70
bases        -0.70
deterrent    -0.70
arri         -0.69
vigil        -0.68
buoy         -0.68
Positive Logits
true          1.23
yes           1.19
false         1.19
personal      1.14
someone       1.13
nothing       1.08
human         1.07
fuck          1.06
everything    1.06
every         1.05
Activations Density 0.093%