INDEX
Explanations
the word "Paris" in various contexts
instances of a specific suffix or word pattern
Negative Logits
testimony       -0.68
blast           -0.64
cl              -0.62
dating          -0.61
plastic         -0.59
Radiation       -0.59
Plastic         -0.59
Dinosaur        -0.58
compromising    -0.57
firing          -0.57
Positive Logits
ais             4.83
ois             1.26
ai              1.21
alis            1.13
aan             1.06
oir             1.02
aire            1.00
neau            1.00
aid             0.99
aque            0.98
Activations Density: 0.012%