Explanations
phrases related to principles or concepts
phrases that discuss theoretical concepts or principles
Negative Logits
Token      Logit
Clicker    -0.72
Surge      -0.69
r          -0.63
æŃ¦        -0.63
Moore      -0.62
Stain      -0.61
vy         -0.61
rir        -0.60
vic        -0.59
ollar      -0.59
Positive Logits
Token          Logit
theoretically  0.86
istically      0.75
pport          0.72
adem           0.71
icably         0.69
ually          0.68
guise          0.67
istic          0.66
udeau          0.64
idence         0.64
Activations Density 0.048%