INDEX
Explanations
mentions of travel or exploration
words related to dialogue or conversation
Negative Logits
Ded        -0.77
cracked    -0.72
lat        -0.70
Inf        -0.67
Dil        -0.64
Gener      -0.62
Ram        -0.62
Wid        -0.61
adam       -0.61
Jet        -0.60
Positive Logits
ogue         4.96
ogy          1.23
ogle         1.20
itton        1.08
iferation    1.07
og           1.06
fashion      0.96
ength        0.94
oire         0.92
odge         0.91
Activations Density 0.017%