Explanations
occurrences of the word "trip."
Negative Logits
omething       -0.74
anguage        -0.73
IRD            -0.64
Newfoundland   -0.63
Canary         -0.62
Liu            -0.62
linem          -0.60
bodied         -0.59
TOM            -0.59
Shining        -0.59
Positive Logits
ublic           1.05
edia            0.98
athi            0.96
oli             0.96
olitics         0.94
ython           0.91
olic            0.91
athy            0.89
kefeller        0.89
oly             0.88
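The two logit lists rank tokens by how strongly this feature pushes their output logits down or up. A common way dashboards compute such lists is to project the feature's decoder direction through the model's unembedding matrix and sort the resulting scores. The sketch below assumes that definition, ignores layer norm and any output bias, and uses hypothetical names (`logit_effects`, `decoder_dir`, `unembed`) with made-up toy numbers rather than this model's actual weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank tokens by the direct effect of a feature on their logits.

    Assumption: the effect on token t is the dot product of row t of the
    unembedding matrix with the feature's decoder direction (layer norm
    and biases ignored). Returns (k most negative, k most positive).
    """
    # One score per vocabulary token: W_U[t] . d
    scores = [sum(w * x for w, x in zip(row, decoder_dir)) for row in unembed]
    # Sort ascending so the most negative effects come first
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy example with a 2-dimensional residual stream (made-up numbers)
vocab = ["ublic", "edia", "omething", "anguage"]
unembed = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0], [0.0, -1.0]]
neg, pos = logit_effects([1.0, 0.2], unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and reading from both ends yields the negative and positive lists from a single pass over the vocabulary.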
Activation Density: 0.005%
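An activation density of 0.005% means the feature fires on roughly 1 in 20,000 tokens (0.005 / 100 = 5e-5). A minimal sketch, assuming density is the fraction of token positions whose activation is strictly greater than a threshold of zero; the function name `activation_density` and its `threshold` parameter are hypothetical, and the activations are made up.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    # Count positions where the feature is considered "on"
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# One firing in 20,000 tokens gives 5e-05, i.e. 0.005%
acts = [0.0] * 19_999 + [2.3]
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```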