INDEX
Explanations
the word "so", especially at the beginning of a sentence
phrases that express conditional statements or hypothetical scenarios
Negative Logits
rals          -0.72
Faul          -0.61
Gaw           -0.59
burner        -0.58
Crack         -0.58
glances       -0.57
ر             -0.56
Flavoring     -0.55
ع             -0.53
flashbacks    -0.53
Positive Logits
oths          1.16
apy           0.95
bered         0.91
oner          0.88
iled          0.85
zin           0.81
iling         0.81
oooo          0.80
othes         0.79
othe          0.79
Activations Density: 0.065%