Explanations
phrases enclosed in quotation marks
quoted phrases or dialogue within the text
Negative Logits
Tanz        -0.75
kw          -0.75
pit         -0.68
boarding    -0.67
rieved      -0.66
kitchens    -0.66
whole       -0.64
Dull        -0.62
robe        -0.61
hunted      -0.59
Positive Logits
Whilst      0.91
/"          0.87
Keynes      0.86
Firstly     0.79
Marx        0.76
"""         0.76
arta        0.75
pler        0.75
netflix     0.72
ablishment  0.72
Activations Density: 0.029%