INDEX
Explanations
words related to reasoning or cause and effect
instances of the word "thus" as a connector in sentences
Negative Logits
Kl            -0.69
Ones          -0.64
Kelvin        -0.61
kick          -0.61
Polo          -0.59
Don           -0.59
ertodd        -0.59
MPH           -0.58
Coffee        -0.58
ropolitan     -0.58
Positive Logits
forth         1.14
forward       0.85
bered         0.84
mia           0.84
mask          0.79
othe          0.76
far           0.75
guiActiveUn   0.74
aper          0.73
hiba          0.73
Activations Density: 0.026%